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Abstract 

We  discuss  the  relationship  between  the  norm  of  the  local  discrete  least  squares  poly- 
nomial approximation  operator,  the  minimal  singular  value  o'lnjri(P=)  of  the  matrix  P= 
of  the  evaluations  of  the  basis  polynomials,  and  the  norming  constant  of  the  set  of  data 
points  S with  respect  to  the  space  of  polynomials.  Since  these  three  quantities  are  equi- 
valent up  to  bounded  constants,  and  since  0min(-Ps)  can  be  efficiently  computed,  it  is 
feasible  to  use  cmin(P=)  as  a tool  for  distinguishing  good  local  point  constellations,  which 
is  useful  for  scattered  data  fitting.  In  addition,  we  give  a simple  new  proof  of  a bound 
by  Reimer  for  the  norm  of  the  interpolation  operators  on  the  sphere  and  extend  it  to 
discrete  least  squares  operators. 


1 Introduction 

Let  Q be  a bounded  domain  in  Rd,  d > 1,  and  let  S = {£1, . . . , £m}  be  a set  of  scattered 
points  in  Cl.  Given  the  values  / 1=  = (/(£  1), . . . , /(£m))r of  an  otherwise  unknown  function 
f : Cl  —*  1R,  we  want  to  reconstruct  / from  these  data.  The  least  squares  method  consists 
in  choosing  some  linear  independent  functions  pi,...,pn  on  Cl,  n < m,  and  computing 
the  coefficients  Oi,  • . . , an  € 1R  that  minimize  the  i 2 norm  of  the  residual  on  E, 

pi  1 /2 

||/Ie  -plnlh  = (]T]l/(6)  -p(£OI2)  , 

»=i 

with  p = aipi  H h anpn  € V :=  spanfp!,. . . ,pn}.  Let  V\-=  :=  span{pi|H,  • • • ,pn|s}- 

If  dimP|=  = n,  then  the  least  squares  solution  is  unique,  and  we  denote  it  by  L-p^f. 
Note  that  the  minimum  norm  solution  available  in  the  case  of  a rank  deficient  problem 
(dim'P|n  < n)  seems  less  useful  since  in  general  it  does  not  reproduce  the  elements  of  V 
exactly. 

The  computation  of  least  squares  approximation  L-p^f  of  / is  expensive  if  rn  and 
n are  large.  To  obtain  a scattered  data  fitting  algorithm  with  linear  complexity  with 
respect  to  the  size  of  data,  a two-stage  method  [8]  can  be  employed  which  consists  in 
1)  covering  the  original  domain  Cl  with  a number  of  subdomains  Clh  each  containing 
only  a small  subset  S*,  = E fi  ST/,-  of  E,  computing  local  approximations  to  the  data  in 
Eft,  and  2)  using  the  information  obtained  from  these  local  approximations  to  build  the 
final  approximation  of  the  (possibly  huge)  original  data  set.  The  least  squares  method 


346 


Approximation  power  of  local  least  squares 


347 


can  be  employed  in  the  local  approximation  stage,  especially  to  deal  with  “real  world” 
data  usually  contaminated  with  errors  or  just  containing  undesirable  “high  frequency” 
components. 

If  V is  chosen  to  be  the  space  11^  of  algebraic  polynomials  in  d variables  of  a suitable 
degree  q,  then  n = (d^q)  • To  achieve  high  approximation  order,  it  is  desirable  to  choose 
q such  that  n is  only  a little  smaller  then  m.  However,  this  is  not  always  possible  due  to 
the  rank  deficiency  or  ill-conditioning  of  the  least  squares  problem,  which  is  especially 
difficult  to  control  if  £1, . . . ,£m  € S*,  are  unevenly  distributed  in  f\..  This  difficulty  can 
in  principle  be  overcome  by  constructing,  for  each  S a suitable  subspace  of  higher 
degree  polynomials  (least  interpolation  space  [2]).  If,  however,  the  polynomial  degree 
is  not  allowed  to  exceed  a fixed  small  value,  then  a common  practical  approach  is  to 
choose  larger  sets  S*,  C E,  with  m substantially  greater  than  n,  see  e.g.  [4]  where  it  is 
suggested  to  use  for  local  least  squares  approximation  m — 11  points  if  V = n§  with 
n = 6 and  m = 15  points  if  V = n§  with  n = 10.  However,  even  these  higher  m provide 
no  guaranty  that  the  matrix 

Psk  ■=  [Pj(6)  : i = j = 1, ...,n] 

of  the  local  least  squares  problem  will  always  be  well-conditioned.  Moreover,  for  some 
data,  this  method  may  lead  to  the  use  of  inappropriately  distant  points  for  the  local 
approximation. 

The  purpose  of  this  paper  is  to  draw  attention  to  the  fact  that  the  conditioning  of 
the  matrix  is  not  only  the  issue  of  numerical  stability  of  the  computation  of  least 
squares.  Indeed,  the  reciprocal  of  the  minimal  singular  value  crmjn(Ps)  of  Ph  provides 
a bound  for  the  norm  of  the  least  squares  operator  Lj>=  if  both  m and  n are  small. 
Therefore,  the  approximation  power  of  local  least  squares  depends  on  crmin(-Fk)  and 
the  best  approximation  from  V.  Since  crmjn(Ps)  can  be  efficiently  computed  for  a small 
matrix  P=  by  well  known  numerical  algorithms,  it  is  feasible  to  use  it  as  a tool  to 
decide  whether  a particular  portion  of  data  is  suitable  for  building  local  least  squares 
approximation  from  V with  reasonable  approximation  power.  If  crm;n(P~)  is  too  small, 
then  either  S or  V should  be  modified,  e.g.  by  adding  more  points  to  S or  using  an 
appropriate  subspace  of  V.  A two-stage  algorithm  for  fitting  large  irregularly  distributed 
scattered  data  sets  employing  the  conditioning  of  the  local  observation  matrices  P=k  is 
studied  in  [3,  5]. 

The  paper  is  organized  as  follows.  In  Section  2 we  discuss  the  relationship  between 
the  norm  of  the  discrete  least  squares  approximation  operator,  the  minimal  singular 
value  <rmin(P=),  and  the  norming  constant  f(V.  E).  As  a by-product,  we  obtain  a new 
proof  of  a known  bound  for  the  norm  of  the  interpolation  operators  on  the  sphere  [7], 
and  extend  it  to  the  discrete  least  squares  operators.  Section  3 illustrates  the  above 
concepts  in  the  univariate  case,  when  they  are  also  related  to  the  separation  distance  of 
S,  while  Section  4 is  devoted  to  a discussion  of  the  least  squares  multivariate  polynomial 
approximation. 
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2 Bounds  for  ||Lp=|[  and  approximation  error 

Let  pi , . . . , pn  be  linearly  independent  continuous  functions  on  D C Rrf  spanning  a linear 
space  V.  Since  all  norms  on  a finite  dimensional  linear  space  are  equivalent,  there  are 
positive  constants  K\ , K2  such  that 

n 

*iIH|2<  ^ A2IMI2  (2.1) 

J=:1  C(Q) 

for  any  coefficient  vector  a — (ai, . . . , an)T  £ R". 

Given  S = {£1, . . . ,£m}  C fl,  we  consider  the  matrix  P~  E Rmx"  as  defined  in  the 
introduction.  Obviously,  rank  P-=  — dim  V\-=.  If  P=  has  full  rank,  then  dim  V\s  = n, 
and  the  least  squares  approximation  Lp=/  is  uniquely  determined,  giving  rise  to  the 
operator  L-p  = : G(Q)  —>  P C.  C(Q). 

It  is  easy  to  see  that  L-pt  = exactly  reproduces  the  elements  of  V,  i.e., 

Lv,=.P  = P , ah  p &P-  (2.2) 

Therefore,  a standard  argument  shows  that 

\\f-Lv,Ef\\cW<(l  + \\LvM)E(f,P)c(n),  (2.3) 

where  E(f,  P)c(Q)  denotes  the  error  of  the  best  approximation  of  / from  V in  Chebyshev 
norm, 

E(f,P)c(n)  ■=  inf  ||/ -p||c(0). 

pEP 

Thus,  an  estimate  for  ||Lp,e||  immediately  gives  an  upper  bound  for  ||/  - Lp^fWan)- 
The  norming  constant  v(P,  E)  of  E with  respect  to  V [6]  can  be  defined  by 

i/(P,S)  =min||p|E||0O/||p||c(n).  (2.4) 

pEP 

Given  any  matrix  A,  we  denote  by  crmjn(j4)  the  minimal  singular  value 

<h™n(4)  = min  JAr]|2. 

Recall  that  if  A has  full  rank,  then  amjn(24)  = ||j4+||J1,  where  A+  is  the  pseudoinverse 
of  A,  see  e.g.  [1]. 

Theorem  2.1  If  rank  P=  = n,  then 

Ki/am-m(P=)  < \\Lv,e\\  < K2V^/vmin(Pz),  (2.5) 

1 MP,E)<  \\LV*\\  <V^MP,  2),  (2.6) 

Kiv(P,E)<  amin(P3)  <K2yfriv{V,~).  (2.7) 

Proof:  We  first  prove  (2.5).  Let  L-p=f  = J2j=i  ajPj ■ H follows  by  a well-known  result 
in  numerical  linear  algebra  that  the  vector  a = (ai, . . . ,a„)T  can  be  computed  as  the 
product  of  the  pseudoinverse  P£  of  P=  with  the  vector  /|=.  Therefore, 

IMI2  = I|pe+/|h||2  < ||Pe+||2||/|h||2  = a-|n(J%)||/|s||2. 
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Since  ||Lp,E/||c(n)  < Fr2|H|2  and  ||/|5||2  < V™  II/Ie||oo  < \MWfWc(.n), -the  upper 
bound  in  (2.5)  follows.  To  prove  the  lower  bound  in  (2.5),  we  choose  a function  / 6 C(Q) 
such  that 

\\pifkh  = iiph+H2||/Ih||2,  wMioo  = ii/lic^, 

which  is  obviously  possible.  Then  by  (2.1)  we  have 

, \\Lv,sf\\c(Si)  > /|e||2  = iflCr“jn(PH)||/|H||2) 

which  implies  the  desired  lower  bound  since  ||/|e  ||2  > ll/lsiloo  = ||/||c(n)- 

Since  ||Pp,s/||c(n)  < £)||(Pp,s/)|e||oo,  the  upper  bound  in  (2.6)  follows  by 

||(^,e/)|e||oo  < ||(^>s/)|e||2  < \\f\sh  < Vm\\f\\c(u)- 

To  prove  the  lower  bound,  we  denote  by  p an  element  of  V for  which  the  minimum  in  (2.4) 
is  attained  and  choose  a function  / € C(Q)  such  that  /|h  =p|h  and  ||/||c(n)  — ||/|s||oo- 
Then  by  (2.2), 

\\Lv,Ef\\c(n)  — Ilp||c(f2)  = ^CP^HpIeIIoo  — v~1{'P,=‘)\\f\\c(n)> 

which  implies  HLp^ll  > ^_1('P,S). 

We  finally  establish  (2.7).  For  any  p 6 V,  let  p = ^"=1  ajPj  and  a = (ai, . . . , an)T. 
Then  p\s  — P=a  and  hence 

Iblslloo  < \\Psa\\2  < v^lblslloo- 

Since 

o-min(^s)  = min i ||PHa||2/||a||2, 

oGRn 

(2.7)  follows  by  (2.1).  □ 

In  view  of  (2.3),  the  upper  bound  in  (2.5)  implies 

11/  - LppfWcm  < (1  + K^/amin{P3))E{f,V)cm,  (2.8) 

which  shows  that  the  approximation  power  of  discrete  least  squares  proportionally  re- 
duces if  <7min(-fk)  (or  i/(P,H))  is  small.  We  will  discuss  some  practical  consequences  of 
this  fact  in  the  next  two  sections. 

Although  u(V,E)  gives  tighter  bounds  for  ||Lp=||,  Umir/Fk)  has  a clear  practical 
advantage  that  it  is  easily  computable  by  using  e.g.  the  singular  value  decomposition  of 
the  small  “local”  matrix  P=.  On  the  other  hand,  the  norming  constants  were  used  in  [6,  9] 
to  derive  estimates  for  the  approximation  error  of  radial  basis  function  interpolation  and 
moving  least  squares,  respectively. 

Remark  2.2  If  pi,...,pn  is  an  orthonormal  basis  for  V,  then  ||a||2  = ||p|[x,2 (f2) , V = 
]P”=1  and  the  constants  Ki,  K2  in  (2.1)  are  closely  related  to  Nikolskii  constants 
of  the  space  P,  namely, 

Ki  = K2  = Noc,2(V), 

NquviP)  ~ mf%\\p\\Lqi(n)/\\p\\Ln(Q),  1 < 91,92  < OO. 
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In  particular,  if  Q = Sd  *,  the  unit  sphere  in  and  {pi,  ■ ■ ■ ,pn}  is  the  set  of  spherical 
harmonics  forming  an  orthonormal  basis  for  the  space  V = Hdq  of  spherical  polynomials 
of  degree  q in  d variables,  then  it  is  not  difficult  to  prove  that  K2  = Aroc.2('b'd)  = 
\/n/\Sd~l\,  where  |Sd_1|  denotes  the  surface  area  of  Sd~l . Therefore,  for  any  set  E C 
Sd~l  with  #E  = m > n,  we  have  by  (2.5), 

HL^hII  < y/nm/\S*-'\/amM)}  (2.9) 

which  recovers  in  the  case  of  interpolation  (m  = n)  an  error  bound  by  Reimer  [7] 
originally  proved  by  using  Lagrangian  square  sums  (see  also  [10]). 

3 Univariate  polynomials 

Let  ft  be  an  interval  [—h,  h)  on  the  real  line  IR,  and  let 

Then  V is  the  restriction  to  [— h,h]  of  the  space  n*_j  of  all  univariate  polynomials 
of  degree  at  most  n - 1.  By  the  well-known  interpolation  properties  of  the  univariate 
polynomials,  rank  P=  =n  for  any  E = {£i, . . . , £m}  C [—h,  h],  m > n,  with  distinct  £,;’s. 
For  any  S'  = , . . . , 6„  } C E,  let  q-=<  denote  the  separation  distance, 

<?=' ;=  ~ min  - &J. 

The  Lebesgue  constant  ||Lt>,h'||  of  the  corresponding  interpolation  scheme  can  be  easily 
estimated  as 

Since  S'  may  be  any  subset  of  S of  cardinality  n and  since  v(V,  E)  > ||Lp,H'||-1!  we  get 


where 


Hence,  by  (2.3)  and  (2.6), 


95, n ;=  max  qs>- 

#H'=n 


This  last  estimate  shows  that  the  univariate  least  squares  polynomials  have  the 
approximation  power  of  the  best  local  polynomial  approximation  as  h — > 0 provided 
h/q~}n  remains  bounded.  However,  if  the  scattered  points  (i,...,6  € [— h,h]  are 
clustered  together  in  at  most  n — 1 very  tight  groups,  then  qs,n  may  be  arbitrarily 
small,  thus  forcing  the  right  hand  side  of  (3.1)  to  blow  up.  To  figure  out  what  happens 
to  ||/  - Ln i s/||c[-/»,fc]  m these  circumstances,  we  consider  the  following  example. 
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Let  h = 1,  n = 2,  f(t ) = t2  - 1/2,  and  E = {-£,0,£}  for  some  0 < £ < 1.  It  is  easy 
to  see  that  LniH/  = —1/2  + 2£2/3.  Since  E(f,  ni)c[-i,i]  — 1/2,  we  have 

11/  - £ni,s/||c[-i,i]  = 1/2  + |l/2  - 2£2/3|  < 2JS(/,n})c[_1>1] 
even  though,  by  a simple  calculation, 

11^=11-1/3  + 1/e, 

V^/crmin(PE)  = l/v(V,  S)  = l/q~,2  = l/£  -*  oo  as  £ 0. 

This  may  contribute  to  the  opinion  that  HLn1,-!!,  'Anin^s),  v(fP,a)  and  qE>n  are  not 
the  right  quantities  to  describe  the  behaviour  of  the  approximation.  Indeed,  as  the 
three  points  — £,0,£  coalesce,  Lni  Ef  converges  to  a Hermite  interpolation  polynomial 
provided  the  entries  of  P=  as  well  as  the  values  of  /|=  are  exact.  However,  if  we  simu- 
late “real  world”  data  by  adding  to  /(— £),/(0),/(£)  normally  distributed  errors  with 
standard  deviation  10~4,  then  the  picture  substantially  changes.  Table  1 shows  that 
11/  ~ -^nb  ,s/llc[-i,i]  does  blow  up  in  this  case.  For  comparison  we  also  include  in  the 
table  the  error  of  ||/  — LiiJ,e/IIc[-i,i]  f°r  the  same  contaminated  data. 

Tab.  1 Average  (dmean)  and  maximum  (d^jax)  of  ||/  - £n[,s/llc[-i,i]  as  well  as 
maximum  of  ||/  - Lnj,s/llc[-r,x]  (^max)  in  1000  tests  with  contaminated  data 


* 

nr3 

10-4 

HT5 

10“6 

io-7 

10~8 

dl 

“mean 

1.06 

1.56 

6.63 

57.3 

564 

5630 

d1 

umax 

1.24 

3.39 

24.9 

240 

2390 

23900 

d° 

umax 

1.00018 

1.00018 

1.00018 

1.00018 

1.00018 

1.00018 

Thus,  if  qs,n  is  too  small,  we  cannot  practically  achieve  with  least  squares  the  ap- 
proximation order  of  E(f , n*  _i)c[-h,h]  simply  because  the  points  lying  too  close  to  each 
other  carry  redundant  information  and  we  have  at  most  n — 1 clusters  of  such  points. 
Therefore,  we  should  adjust  the  polynomial  degree  to  the  given  data  paying  attention  to 
the  trade-off  between  higher  approximation  power  of  higher  degree  polynomials  and  the 
“pollution”  caused  by  the  factor  qZln  that  increases  with  n.  In  practice  one  may  choose 
maximal  n such  that  h/qE,n  is  smaller  than  a prescribed  tolerance  value  0 < E < oo. 

4 Multivariate  polynomials 

The  situation  becomes  substantially  more  complicated  when  we  turn  to  multivariate 
polynomials.  Let  fl  be  a bounded  domain  in  ]Rd  and  let  {pi,  ■ ■ ■ ,pn},  n ~ be 

a basis  of  the  space  V = of  polynomials  in  d variables  of  total  degree  q satisfying 
(2.1)  on  51.  (For  example,  we  may  consider  a properly  scaled  standard  power  basis  with 
the  center  at  a point  in  51  or  the  Bernstein-Bezier  basis  with  respect  to  some  simplex 
overlapping  51  or  a significant  part  of  it.)  Let,  furthermore,  S be  an  arbitrary  finite  set 
of  points  in  51  such  that  m = > n. 

The  first  problem  we  face  in  the  case  d > 2 is  that  the  matrix  PE  may  be  rank  deficient. 
It  is  clear,  however,  that  there  is  no  practical  difference  between  this  situation  and  the 
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one  when  P=  has  full  rank  but  is  extremely  ill-conditioned,  he.,  cmin(PE)  is  very  small. 
Moreover,  (2.8)  shows  that  even  moderately  small  CTmin(-PH)  may  significantly  reduce 
the  approximation  power  of  £?>=;.  Clearly,  the  same  can  also  happen  in  the  univariate 
case  if  is  too  small.  The  real  difficulty  of  the  multivariate  case  seems  to  be  that 
simple  characteristics  of  E,  like  separation  distance  qs,m  do  not  give  much  information 
about  the  norm  of  L-p  =.  For  example,  six  equidistant  points  on  the  unit  circle  in  R2 
are  well  separated  and  look  reasonably  distributed.  However,  they  are  not  good  for  least 
squares  approximation  from  the  space  n2  since  the  matrix  P=  is  rank  deficient.  Suitably 
perturbed,  these  points  will  give  rise  to  the  least  squares  operator  Ln^E  with  a very 

large  norm.  More  generally,  the  norm  of  Lnd  >H  will  be  large  if  the  points  in  E C Rrf  lie 
“too  close”  to  an  algebraic  hypersurface  of  order  q. 

If  the  data  are  comparatively  dense  in  Cl,  namely  the  fill  distance 

hs,n  sup  min  \x  — £| 

Cch 

does  not  exceed  some  small  positive  constant  depending  on  ft  and  the  polynomial  degree, 
then  the  estimates  of  the  norming  constant  n(n^,E)  given  in  [9]  provide  a bound  for 
||I/nrf|n,Ell>  in  view  of  (2.6).  For  example,  if  Cl  is  a ball  of  radius  r,  then  n(n^|n,S)  > 1/2 
if  hs,Q  < 0.11  r/q2. 

On  the  other  hand,  without  any  density  assumptions  we  can  always  rely  on  (2.8), 
where  o’min(F,H)  can  be  efficiently  computed  by  well  known  algorithms  of  numerical  lin- 
ear algebra.  In  some  sense,  small  crm;n(P=)  indicates  that  the  local  data  has  “hidden 
redundancies”  ( e.g . too  many  points  lying  very  close  to  the  same  straight  line  or  the 
same  ellipse)  that  prevent  it  from  carrying  enough  information  for  a “full  power”  ap- 
proximation of  the  underlying  function  from  n^.  Similar  to  the  univariate  case,  but  using 
crra;n  (P=)  instead  of  q=,n,  we  can  adaptively  choose  the  polynomial  degree  according  to 
the  following  algorithm  that  has  proven  to  be  useful  for  scattered  data  fitting  [3,  5]. 

Let  Cl  C R , E C Cl,  #S  = m.  Denote  by  PJ  the  matrix  of  the  evaluations  of 
appropriate  basis  functions  for  TY^,  q > 0,  at  the  points  (£E. 

Algorithm  4.1  Starting  with  some  q = qo  > 0 such  that  < m,  compute  <Tmin  (Pi). 
If  l/crmin(P|)  is  smaller  than  a prescribed  tolerance  E < oo,  then  compute  the  least 
squares  Tl^- approximation  to  the  data  in  E and  accept  it  as  a reliable  approximation  on 
Cl.  Otherwise,  repeat  the  same  with  q = qo  — 1 and  successively  reduce  the  degree  q to 
qo  — 2, ...  ,0,  while  l/o'min(F,|)  > E.  For  q — 0 no  comparison  of  l/crm\n{P~)  with  E is 
needed  since  ||Lng|n,sll  *s  bounded  for  any  S2  and  E. 

Note  that,  optionally,  the  condition  number  ||P|||2/(Train(-PE)  °f  can  use<^  ™ 
the  above  algorithm  instead  of  l/<rmin(P|),  as  it  has  been  formulated  in  [5]. 
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